AI has become a big thing over the last couple of years, but one question remains: which chatbot is the best? We at Carleton Testing set out to answer it by putting Gemini, ChatGPT, Grok, and Deepseek up against each other in categories like Math, Writing, Interpretation, and overall Flow of Use. Read on to find out.

Math 1: Simple Calculations

One of the many tasks people turn to AI for is math. With simple arithmetic like (4+2*4-9)/3 + 53, AI handles it well. When testing these AIs, we let them take time to think if they had that option. Starting with Gemini, which we ranked 2nd in the group: Gemini explained each of its steps, but it broke the equation apart so much that the explanation, along with the formatting, became confusing. The next AI we tested was ChatGPT. ChatGPT showed its steps with simple explanations, and because it deconstructed the equation less than Gemini did, we ranked it 1st in this category. The third AI we tested was Grok, which ranked last for math. When given the chance to think, it tried every method it could to check and double-check its answer, taking up three pages just for thinking. The answer itself was given in three main steps, but when Grok restated the question, it used different formatting than it had used for the steps. Grok's formatting overall distracted from the purpose of the chat, and that earned it its ranking. The final AI we tested was Deepseek. Deepseek used four main steps and gave easy-to-understand explanations for them, but compared with Gemini or ChatGPT, the speed and consistency of its answers were a roll of the dice, so Deepseek came in 3rd. All the AIs got the correct answer; how they got there, and how quickly, differed. With all that, ChatGPT took 1st, Gemini 2nd, Deepseek 3rd, and Grok 4th.
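For reference, here is the order-of-operations breakdown every bot had to reproduce (our own worked steps, not any single bot's output):

```latex
\begin{aligned}
(4 + 2 \cdot 4 - 9)/3 + 53 &= (4 + 8 - 9)/3 + 53 && \text{multiplication inside the parentheses first} \\
  &= 3/3 + 53 && \text{then finish the parentheses} \\
  &= 1 + 53 && \text{then the division} \\
  &= 54
\end{aligned}
```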

Math 2: Algebra (Zero Product Property)

While it's great that these AIs can do simple math, many more people need help with more difficult and complicated equations. To test that, we gave the following prompt: “3x^2=15x. Solve using the zero product property”. Again, we allowed the AIs to take time to think, but not to research. Kicking it off with Gemini: its answer came in a few understandable steps, and this time the explanations and formatting were clean and easy to follow. The final answer was correct, but Gemini was awarded 3rd in this category because it didn’t check its answer. Following this, ChatGPT was awarded 2nd. It gave the answer in a controlled and neat way, with explanations for each of its few steps. The final answer was correct, but like Gemini, it didn’t check the answer. Coming in 1st in this category was Grok. Grok's answer was correct and its steps were shown, though the layout was crowded and slightly harder to follow; what gave it the edge was that it checked its answer and proved it was correct. Coming in last was Deepseek, which explained the least per step, and it wasn't always clear how each piece fit in, though the final answer was correct. All the bots can handle this kind of algebra, but how they show their work differs. To finish this off, Grok comes in 1st, ChatGPT 2nd, Gemini trailing at 3rd, and Deepseek at 4th.
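For readers following along, here is the full zero-product-property solution (our own derivation), including the kind of check that earned Grok its win:

```latex
\begin{aligned}
3x^2 &= 15x \\
3x^2 - 15x &= 0 && \text{move everything to one side} \\
3x(x - 5) &= 0 && \text{factor out } 3x \\
x = 0 \quad &\text{or} \quad x = 5 && \text{zero product property} \\
\text{Check: } 3(0)^2 &= 15(0) = 0, \qquad 3(5)^2 = 75 = 15(5)
\end{aligned}
```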

Writing: Following Instructions and Clarity

Having shown that these AIs can handle math, the next area we tested was writing. We looked for answers that were grammatically correct and had the right content, and we checked whether the AIs followed the given instructions and showed sources. For this test we counted citations toward the word count, using the following prompt: “Write a simple 100-word paragraph about the differences between AI reinforcement learning and typical human learning. Provide sources and cite them parenthetically.” When Gemini 2.0 Advanced reasoning was given this prompt, it produced a 108-word paragraph plus a separate bibliography. Gemini's writing felt formal and certain, and it made sense, but the wording was not the simplest. ChatGPT free was awarded 2nd for giving us 105 words, cited both in text and in a separate bibliography. ChatGPT's writing felt more human and conversational, less formal and certain. Following this, we had Grok. Grok stayed under the 100-word target, giving us 90 words. Yet again there was more thinking than answer in its response, and the response time showed it. The vocabulary was at a much higher level than the “simple” the prompt requested, and while Grok cited in the text, it gave no bibliography. Finally we had Deepseek. Deepseek (without DeepThink) was awarded 3rd, as it also stayed under the target at 99 words, but its response time was slow and it gave no bibliography. Overall, the word counts came close but never hit the mark exactly, and the same went for the requested simple language, though every bot stuck to the topic and expanded on it. Ultimately, Gemini was awarded 1st, ChatGPT 2nd, Deepseek climbed back up to 3rd, and Grok was back at 4th.
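Our counting rule was simple: split on whitespace and count every token, parenthetical citations included. Here is a minimal Python sketch of that rule (the word_count helper is our own illustration; we tallied the actual counts by hand):

```python
# Minimal sketch of our counting rule: every whitespace-separated token
# counts as a word, including parenthetical citations.
def word_count(paragraph: str) -> int:
    return len(paragraph.split())

sample = ("AI reinforcement learning relies on trial-and-error reward "
          "signals (Sutton and Barto 2018).")
print(word_count(sample))  # 12
```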

Understanding: Image Interpretation

Now that we have tested what these AIs can do, we needed to check whether they understand what they see and remember what has already been said. To test understanding, we gave each AI an image generated by Google Whisk and asked whether the image was AI-generated, and why. The image had hints of AI generation, but could still pass a quick glance from the human eye. We then ran the same test with a real image, and finally asked for a comparison between the two. Testing started with Gemini, which had issues, but for a good reason: Gemini has privacy features that scan an image and block access to it if it violates Google's code of conduct, for example when people appear in the image. Even so, for the images Gemini does look at, it has the best recognition of what is in them. When fed an image of what looked like only one turtle, it noticed features pointing to the conclusion that there were actually two. Gemini gave quick and easy explanations for the questions and its steps, but for the comparison it pulled in two random images instead of the two the conversation had been about the whole time. Even with that, Gemini was awarded 1st. ChatGPT came in just behind, in 2nd. This time there were no privacy blocks, but it mistook the AI image for a real one, stating that it contained natural discrepancies. When looking at the real image, which it classified correctly, it only noticed one of the two turtles. Both of its explanations were thorough, and it gave a categorized overview of the two images when asked. The next AI was Grok, which was awarded 3rd. Grok found the first image to be real and gave 5 good reasons why, with explanations. Like ChatGPT, Grok saw only one of the two turtles but correctly identified the image as natural. Grok also gave 5 main points when comparing the images, with plenty of detail, though it was slightly harder to follow than ChatGPT. The final AI we tested was Deepseek. Deepseek could only extract words and text from images, not interpret their contents, so we had to award it 4th. Overall, the AIs did a good job with explanations, but all had trouble truly understanding the images. That leaves Gemini in 1st, ChatGPT in 2nd, Grok in 3rd, and Deepseek, lacking image understanding entirely, left in 4th.

Flow of Use: App and Webpage Experience

The most important feature of an AI is how it fits into everyday life. When comparing the AIs, we looked at each phone app along with its website, judging ease of understanding, ecosystems, and flow. Gemini has both a webpage and a phone app. The phone app feels designed from the ground up for a handheld device; for example, the settings and chat history are each one tap away. Both the app and the webpage have great interfaces and features: both can run different models, which have access to Canvas and Deep Research, and both allow PDF and other file uploads. On top of that, Gemini integrates into the Google ecosystem, letting it access your Google Drive, Gmail, and Calendar. The main difference is that the app has Gemini Live, where you can speak with the AI in a human-like fashion. All told, we gave both the app and the webpage 1st. Next up was ChatGPT, rated 2nd for both the phone app and the webpage. The phone app's interface was nice and easy, but it took more steps to find your settings and profile. The app and webpage share the same features, including live chat, which lets desktop users have that experience too, though the live chat feels rougher to use than Gemini's. ChatGPT can also upload PDFs, along with other files. Grok lost any chance of a comeback in this category. On both the phone app and the webpage, the user interface is disorganized and distracting. Neither platform has live chat, though both offer more comprehensive controls over the AI. The downside of Grok's upload feature is that a single upload only supports images, not PDFs or documents, and for these reasons Grok was rated 4th on both the app and the webpage. The final AI we tested was Deepseek, awarded 3rd for both the phone app and the webpage: nothing special about either, but while slower, it was reliable. After all this, the differences between the AIs are staggering, with Gemini 1st for both phone app and webpage, ChatGPT's app and webpage in 2nd, Deepseek's platforms both in 3rd, and Grok falling to the bottom, with both its app and webpage in 4th.

Results Table

Category          Gemini   ChatGPT   Grok   Deepseek
Math 1               2        1        4       3
Math 2               3        2        1       4
Writing              1        2        4       3
Interpretation       1        2        3       4
Flow (App)           1        2        4       3
Flow (Webpage)       1        2        4       3
Average Place       1.5      1.8      3.3     3.3
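The "Average Place" row is simply the mean of each bot's six category ranks. A quick sketch of the arithmetic, using the ranks from the table above:

```python
# Average place = mean of the six category ranks for each bot,
# in table order: Math 1, Math 2, Writing, Interpretation,
# Flow (App), Flow (Webpage).
ranks = {
    "Gemini":   [2, 3, 1, 1, 1, 1],
    "ChatGPT":  [1, 2, 2, 2, 2, 2],
    "Grok":     [4, 1, 4, 3, 4, 4],
    "Deepseek": [3, 4, 3, 4, 3, 3],
}
for bot, places in ranks.items():
    print(bot, round(sum(places) / len(places), 1))
# Gemini 1.5, ChatGPT 1.8, Grok 3.3, Deepseek 3.3
```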

Conclusion

To sum up, after putting these AIs to the test and averaging their placements, Gemini comes in 1st with an average place of 1.5, ChatGPT comes in 2nd at 1.8, and Grok and Deepseek tie for 3rd at 3.3. All of these AIs have their strong suits, but Gemini being linked to Google gives it the edge in this competition, and it is the one we would recommend for most users. Hope you enjoyed this week's paper from the Carleton Testers.